Model-based image analysis for forensic shoe print recognition
This thesis is about automated forensic shoe print recognition. Recognizing a shoe print
in an image is an inherently difficult task. Shoe prints vary in their pose, shape and
appearance. They are surrounded and partially occluded by other objects and may
be left on a wide range of diverse surfaces. We propose to formulate this task in a
model-based image analysis framework.
Our framework is based on the Active Basis Model. A shoe print is represented as a hierarchical composition of basis filters. The individual filters encode local information about the geometry and appearance of the shoe print pattern. The hierarchical composition encodes mid- and long-range geometric properties of the object. A statistical distribution is imposed on the parameters of this representation in order to account for the variation in a shoe print's geometry and appearance.
Our work extends the Active Basis Model in various ways in order to make it robustly applicable to the analysis of shoe print images. We propose an algorithm that automatically infers an efficient hierarchical dependency structure between the basis filters. The learned hierarchical dependencies are beneficial for our further extensions, while at the same time permitting an efficient optimization process. We introduce an occlusion model and propose to leverage the hierarchical dependencies to integrate contextual information efficiently into the reasoning process about occlusions. Finally, we study the effect of the basis filters on discriminating the object from the background. In this context, we highlight how the hierarchical model structure combines the locally ambiguous filter responses into a sophisticated discriminator.
The main contribution of this work is a model-based image analysis framework that represents a planar object's variation in shape and appearance, its partial occlusion, and background clutter. The model parameters are optimized jointly in an efficient optimization scheme. Our extensions to the Active Basis Model lead to improved discriminative ability and permit coherent occlusions and hierarchical deformations. The experimental results demonstrate state-of-the-art performance on the task of forensic shoe print recognition.
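As a rough illustration of the kind of template scoring described above, the sketch below scores a set of oriented basis filters at template locations and lets a per-filter occlusion switch explain weak responses. The Gabor filters, the max-over-shift deformation, and the fixed background score are illustrative assumptions, not the thesis implementation.

# Minimal sketch: scoring a shoe-print template of oriented basis filters
# with a binary occlusion switch per filter. Names and constants are assumed.
import numpy as np

def gabor(theta, size=11, sigma=2.5, lam=6.0):
    """Oriented Gabor filter used as a local basis element."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)
    return g / np.linalg.norm(g)

def local_response(image, filt, cy, cx, shift=2):
    """Max filter response in a small window around (cy, cx): a crude
    stand-in for the model's local geometric deformation."""
    h = filt.shape[0] // 2
    best = -np.inf
    for dy in range(-shift, shift + 1):
        for dx in range(-shift, shift + 1):
            y, x = cy + dy, cx + dx
            patch = image[y - h:y + h + 1, x - h:x + h + 1]
            if patch.shape == filt.shape:
                best = max(best, float(np.abs(np.sum(patch * filt))))
    return best

def score_template(image, template, bg_score=0.1):
    """Sum per-filter evidence; a filter whose response falls below the
    background score is explained as occluded (binary occlusion switch)."""
    total, occluded = 0.0, []
    for (cy, cx, theta) in template:
        r = local_response(image, gabor(theta), cy, cx)
        if r > bg_score:
            total += r
        else:
            total += bg_score
            occluded.append((cy, cx))
    return total, occluded

# toy usage with random data
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
template = [(20, 20, 0.0), (20, 40, np.pi / 4), (40, 30, np.pi / 2)]
print(score_template(img, template))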
COMPAS: Representation Learning with Compositional Part Sharing for Few-Shot Classification
Few-shot image classification consists of two consecutive learning processes:
1) In the meta-learning stage, the model acquires a knowledge base from a set
of training classes. 2) During meta-testing, the acquired knowledge is used to
recognize unseen classes from very few examples. Inspired by the compositional
representation of objects in humans, we train a neural network architecture
that explicitly represents objects as a set of parts and their spatial
composition. In particular, during meta-learning, we train a knowledge base
that consists of a dictionary of part representations and a dictionary of part
activation maps that encode common spatial activation patterns of parts. The
elements of both dictionaries are shared among the training classes. During
meta-testing, the representation of unseen classes is learned using the part
representations and the part activation maps from the knowledge base. Finally,
an attention mechanism is used to strengthen those parts that are most
important for each category. We demonstrate the value of our compositional
learning framework for few-shot classification using miniImageNet,
tieredImageNet, CIFAR-FS, and FC100, where we achieve state-of-the-art
performance.
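A minimal sketch of the idea, with assumed dimensions and an assumed least-squares fit standing in for the learned encoding: a class is expressed through a shared dictionary of part vectors and a shared dictionary of spatial activation maps, weighted by per-part attention.

# Minimal sketch (not the paper's code) of compositional part sharing.
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 16, 6, 6      # feature channels and spatial size of the backbone output
K, M = 8, 4             # number of shared parts / shared activation maps

part_dict = rng.standard_normal((K, C))             # shared part representations
act_dict = np.abs(rng.standard_normal((M, H, W)))   # shared spatial activation patterns

def encode_class(support_features, part_dict, act_dict):
    """Fit per-part mixing weights over the shared activation maps and an
    attention weight per part from a few support feature maps (N, C, H, W)."""
    ev = np.einsum('kc,nchw->nkhw', part_dict, support_features)      # part evidence per location
    ev_mean = ev.mean(axis=0)                                         # (K, H, W)
    # express each part's evidence map as a combination of shared activation maps
    A = act_dict.reshape(M, -1).T                                     # (H*W, M)
    coeff, *_ = np.linalg.lstsq(A, ev_mean.reshape(K, -1).T, rcond=None)  # (M, K)
    # attention: parts with stronger overall evidence matter more for this class
    attn = np.exp(ev_mean.mean(axis=(1, 2)))
    attn /= attn.sum()
    return coeff, attn

def score(query_features, coeff, attn, part_dict, act_dict):
    """Compare a query feature map (C, H, W) against the class encoding."""
    ev = np.einsum('kc,chw->khw', part_dict, query_features)
    recon = np.einsum('mk,mhw->khw', coeff, act_dict)
    per_part = -((ev - recon) ** 2).mean(axis=(1, 2))
    return float((attn * per_part).sum())

support = rng.standard_normal((5, C, H, W))   # 5-shot support set of one class
query = rng.standard_normal((C, H, W))
coeff, attn = encode_class(support, part_dict, act_dict)
print(score(query, coeff, attn, part_dict, act_dict))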
Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model
Amodal completion is a visual task that humans perform easily but which is
difficult for computer vision algorithms. The aim is to segment those object
boundaries which are occluded and hence invisible. This task is particularly
challenging for deep neural networks because data is difficult to obtain and
annotate. Therefore, we formulate amodal segmentation as an out-of-task and
out-of-distribution generalization problem. Specifically, we replace the fully
connected classifier in neural networks with a Bayesian generative model of the
neural network features. The model is trained from non-occluded images using
bounding box annotations and class labels only, but is applied to generalize
out-of-task to object segmentation and to generalize out-of-distribution to
segment occluded objects. We demonstrate how such Bayesian models can naturally
generalize beyond the training task labels when they learn a prior that models
the object's background context and shape. Moreover, by leveraging an outlier
process, Bayesian models can further generalize out-of-distribution to segment
partially occluded objects and to predict their amodal object boundaries. Our
algorithm outperforms alternative methods that use the same supervision by a
large margin, and even outperforms methods where annotated amodal segmentations
are used during training, when the amount of occlusion is large. Code is
publicly available at https://github.com/YihongSun/Bayesian-Amodal
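The following sketch illustrates the mechanism in miniature, not the released code: a generative model of backbone features with a fixed outlier constant explains occluded positions, and a class shape prior completes the amodal mask. The Gaussian feature likelihood, the thresholds, and the toy data are assumptions for illustration.

# Minimal sketch of amodal completion via an outlier process over features.
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 8, 10, 10

class_mean = rng.standard_normal(C)          # feature prototype of the object class
shape_prior = np.zeros((H, W))               # prior probability that a position belongs to the object
shape_prior[2:8, 3:7] = 0.9                  # (in the paper, learned from boxes and class labels only)
features = rng.standard_normal((C, H, W))    # backbone features of a test image
features[:, 2:5, 3:7] += class_mean[:, None, None]   # visible object part
# rows 5:8 stay background-like, as if an occluder covered the lower object half

def amodal_segment(features, class_mean, shape_prior, outlier_loglik=-0.7):
    obj_ll = -0.5 * ((features - class_mean[:, None, None]) ** 2).mean(axis=0)
    visible = (obj_ll > outlier_loglik) & (shape_prior > 0.5)    # explained by the object model
    occluded = (obj_ll <= outlier_loglik) & (shape_prior > 0.5)  # explained by the outlier process
    amodal = visible | occluded                                   # completion via the shape prior
    return amodal, occluded

amodal, occluded = amodal_segment(features, class_mean, shape_prior)
print("amodal positions:", int(amodal.sum()), "of which occluded:", int(occluded.sum()))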
Empirically Analyzing the Effect of Dataset Biases on Deep Face Recognition Systems
It is unknown what kinds of biases modern in-the-wild face datasets have because of their lack of annotation. A direct consequence of this is that total recognition rates alone provide only limited insight into the generalization ability of Deep Convolutional Neural Networks (DCNNs). We propose to
empirically study the effect of different types of dataset biases on the
generalization ability of DCNNs. Using synthetically generated face images, we
study the face recognition rate as a function of interpretable parameters such
as face pose and light. The proposed method allows valuable details about the
generalization performance of different DCNN architectures to be observed and
compared. In our experiments, we find that: 1) Indeed, dataset bias has a
significant influence on the generalization performance of DCNNs. 2) DCNNs can
generalize surprisingly well to unseen illumination conditions and large
sampling gaps in the pose variation. 3) Using the presented methodology we
reveal that the VGG-16 architecture outperforms the AlexNet architecture at
face recognition tasks because it can much better generalize to unseen face
poses, although it has significantly more parameters. 4) We uncover a main
limitation of current DCNN architectures: the difficulty of generalizing when different identities do not share the same pose variation. 5) We
demonstrate that our findings on synthetic data also apply when learning from
real-world data. Our face image generator is publicly available to enable the
community to benchmark other DCNN architectures.
Comment: Accepted to CVPR 2018 Workshop on Analysis and Modeling of Faces and Gestures (AMFG).
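A toy version of the evaluation protocol, with a placeholder generator and a nearest-neighbour classifier standing in for the synthetic face generator and a trained DCNN: the recognition rate is reported per value of an interpretable pose parameter.

# Minimal sketch: recognition rate as a function of an interpretable parameter (yaw).
import numpy as np

rng = np.random.default_rng(0)

def synthetic_face(identity, yaw):
    """Placeholder generator: returns a feature vector that depends on identity
    and degrades with yaw, mimicking pose-dependent appearance change."""
    base = np.sin(np.arange(32) * (identity + 1))
    return base * np.cos(np.radians(yaw)) + 0.3 * rng.standard_normal(32)

def recognize(feature, gallery):
    """Nearest-neighbour stand-in for a trained DCNN: predict the closest gallery identity."""
    dists = [np.linalg.norm(feature - g) for g in gallery]
    return int(np.argmin(dists))

identities = list(range(10))
gallery = [synthetic_face(i, yaw=0) for i in identities]   # frontal enrollment

# recognition rate per pose bin
for yaw in (0, 15, 30, 45, 60, 75):
    correct = sum(recognize(synthetic_face(i, yaw), gallery) == i for i in identities)
    print(f"yaw {yaw:2d} deg: accuracy {correct / len(identities):.2f}")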
Informed MCMC with Bayesian Neural Networks for Facial Image Analysis
Computer vision tasks are difficult because of the large variability in the
data that is induced by changes in light, background, partial occlusion as well
as the varying pose, texture, and shape of objects. Generative approaches to
computer vision allow us to overcome this difficulty by explicitly modeling the
physical image formation process. Using generative object models, the analysis
of an observed image is performed via Bayesian inference of the posterior
distribution. This conceptually simple approach tends to fail in practice
because of several difficulties stemming from sampling the posterior
distribution: high-dimensionality and multi-modality of the posterior
distribution as well as expensive simulation of the rendering process. The main
difficulty of sampling approaches in a computer vision context is choosing the
proposal distribution accurately so that maxima of the posterior are explored
early and the algorithm quickly converges to a valid image interpretation. In
this work, we propose to use a Bayesian Neural Network for estimating an image
dependent proposal distribution. Compared to a standard Gaussian random walk
proposal, this accelerates the sampler in finding regions of the posterior with
high value. In this way, we can significantly reduce the number of samples
needed to perform facial image analysis.
Comment: Accepted to the Bayesian Deep Learning Workshop at NeurIPS 2018.
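The sketch below contrasts a Gaussian random-walk proposal with an image-dependent independence proposal in a standard Metropolis-Hastings loop, on a toy one-dimensional posterior. In the paper the proposal mean and uncertainty would come from a Bayesian neural network applied to the input image; here they are fixed constants for illustration.

# Minimal sketch of Metropolis-Hastings with an informed (independence) proposal.
import numpy as np

rng = np.random.default_rng(0)

def gauss_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def log_posterior(theta):
    # toy 1-D posterior over a pose parameter (stand-in for the render-and-compare likelihood)
    return gauss_logpdf(theta, 2.0, 0.3)

def mh(n_steps, proposal):
    theta, accepted = 0.0, 0
    for _ in range(n_steps):
        cand, log_q_fwd, log_q_rev = proposal(theta)
        log_alpha = (log_posterior(cand) - log_posterior(theta)) + (log_q_rev - log_q_fwd)
        if np.log(rng.uniform()) < log_alpha:
            theta, accepted = cand, accepted + 1
    return theta, accepted / n_steps

def random_walk(theta, step=0.1):
    # symmetric Gaussian random walk: the proposal correction cancels
    return theta + step * rng.standard_normal(), 0.0, 0.0

def informed(theta, mean=1.9, scale=0.5):
    # independence proposal; in the paper, mean and scale would be predicted
    # by a Bayesian neural network from the input image
    cand = mean + scale * rng.standard_normal()
    return cand, gauss_logpdf(cand, mean, scale), gauss_logpdf(theta, mean, scale)

print("random walk proposal:", mh(2000, random_walk))
print("informed proposal   :", mh(2000, informed))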
Compositional Convolutional Neural Networks: A Deep Architecture with Innate Robustness to Partial Occlusion
Recent findings show that deep convolutional neural networks (DCNNs) do not
generalize well under partial occlusion. Inspired by the success of
compositional models at classifying partially occluded objects, we propose to
integrate compositional models and DCNNs into a unified deep model with innate
robustness to partial occlusion. We term this architecture Compositional
Convolutional Neural Network. In particular, we propose to replace the fully
connected classification head of a DCNN with a differentiable compositional
model. The generative nature of the compositional model enables it to localize
occluders and subsequently focus on the non-occluded parts of the object. We
conduct classification experiments on artificially occluded images as well as
real images of partially occluded objects from the MS-COCO dataset. The results
show that DCNNs do not classify occluded objects robustly, even when trained
with data that is strongly augmented with partial occlusions. Our proposed
model outperforms standard DCNNs by a large margin at classifying partially
occluded objects, even when it has not been exposed to occluded objects during
training. Additional experiments demonstrate that CompositionalNets can also
localize the occluders accurately, despite being trained with class labels
only. The code used in this work is publicly available.
Comment: CVPR 2020; code is available at https://github.com/AdamKortylewski/CompositionalNets; supplementary material: https://adamkortylewski.com/data/compnet_supp.pd
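A minimal sketch, not the released CompositionalNets code, of replacing a fully connected head with a compositional head: per-position part evidence competes against a fixed occluder score, so weak positions are explained as occluded and can be read off as an occlusion map. Cosine similarities and random prototypes are illustrative assumptions.

# Minimal sketch of a compositional classification head with occluder localization.
import numpy as np

rng = np.random.default_rng(0)
C, H, W, n_classes = 16, 7, 7, 3

# random stand-ins for the learned, class-specific part prototypes at each position
part_protos = rng.standard_normal((n_classes, H, W, C))
part_protos /= np.linalg.norm(part_protos, axis=-1, keepdims=True)

def compositional_head(feat, occluder_score=0.3):
    """feat: (C, H, W) backbone features. Returns class scores and, for the best
    class, a boolean map of positions explained by the occluder model."""
    f = feat / (np.linalg.norm(feat, axis=0, keepdims=True) + 1e-8)     # unit-normalize per position
    ev = np.einsum('khwc,chw->khw', part_protos, f)                      # class evidence per position
    per_pos = np.maximum(ev, occluder_score)                             # occluder explains weak positions
    scores = per_pos.mean(axis=(1, 2))
    best = int(np.argmax(scores))
    occlusion_map = ev[best] < occluder_score
    return scores, best, occlusion_map

feat = rng.standard_normal((C, H, W))
scores, best, occ = compositional_head(feat)
print("scores:", np.round(scores, 3), "predicted:", best, "occluded positions:", int(occ.sum()))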
General Neural Gauge Fields
The recent advance of neural fields, such as neural radiance fields, has
significantly pushed the boundary of scene representation learning. Aiming to
boost the computation efficiency and rendering quality of 3D scenes, a popular
line of research maps the 3D coordinate system to another measuring system,
e.g., 2D manifolds and hash tables, for modeling neural fields. The conversion
of coordinate systems can be typically dubbed as gauge transformation, which is
usually a pre-defined mapping function, e.g., orthogonal projection or spatial
hash function. This begs a question: can we directly learn a desired gauge
transformation along with the neural field in an end-to-end manner? In this
work, we extend this problem to a general paradigm with a taxonomy of discrete
& continuous cases, and develop an end-to-end learning framework to jointly
optimize the gauge transformation and neural fields. To counter the problem
that the learning of gauge transformations can collapse easily, we derive a
general regularization mechanism from the principle of information conservation
during the gauge transformation. To circumvent the high computation cost in
gauge learning with regularization, we directly derive an information-invariant gauge transformation that inherently preserves scene information and yields superior performance.
Comment: ICLR 2023
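A rough sketch of jointly learning a gauge transformation and a neural field in PyTorch, under toy supervision. The pairwise-distance regularizer below is only an assumed stand-in for the paper's information-conservation principle, used here to discourage the learned mapping from collapsing.

# Minimal sketch: jointly optimizing a learned gauge transformation and a neural field.
import torch
import torch.nn as nn

torch.manual_seed(0)

gauge = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 2))   # 3D -> 2D gauge mapping
field = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 3))   # field evaluated in gauge coords
opt = torch.optim.Adam(list(gauge.parameters()) + list(field.parameters()), lr=1e-3)

def target_rgb(x):
    """Toy supervision: a smooth colour function of 3D position."""
    return torch.sin(3.0 * x)

for step in range(200):
    x = torch.rand(256, 3)                    # sampled 3D points
    u = gauge(x)                              # gauge-transformed coordinates
    pred = field(u)
    recon = ((pred - target_rgb(x)) ** 2).mean()

    # anti-collapse regularizer: keep the mean pairwise distance after the mapping
    # comparable to the one before, so the gauge does not shrink to a point
    dx = torch.cdist(x, x)
    du = torch.cdist(u, u)
    reg = (du.mean() - dx.mean()) ** 2

    loss = recon + 0.1 * reg
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final loss:", float(loss))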